Discovering Knowledge from High-Dimensional Geographic Data: Integrating Visual and Computational Approaches

Authors

  • Diansheng Guo
  • Mark Gahegan
  • Alan M. MacEachren
  • Donna J. Peuquet
Abstract

It has been widely recognized that spatial data analysis capabilities have not kept pace with the need to analyze the increasingly large volumes of multi-theme geographic data that are currently being collected and archived (Openshaw 1991; Miller and Han 2001; Shekhar, Vatsavai et al. 2002; Guo 2003; Guo, Peuquet et al. 2003; Muntz, Barclay et al. 2003). On one hand, such a wealth of data holds great opportunities for geographers, environmental scientists, public health researchers, and others to address urgent and sophisticated geographic problems, e.g., global change and epidemics such as SARS. On the other hand, existing data analysis methods fall short in extracting meaningful patterns from datasets of such unprecedentedly large size (in terms of the number of observations) and high dimensionality (in terms of the number of variables). Data mining and knowledge discovery refers to the overall process of discovering useful knowledge from data, which generally involves data selection, data pre-processing, data transformation, incorporation of appropriate prior knowledge, data mining, and proper interpretation of the results (Fayyad, Piatetsky-Shapiro et al. 1996). While data mining and KDD research has been widely conducted in areas such as business, bioinformatics, and text mining, it is still at a very early stage in geographic domains. Geography is an integrative discipline, and the geographic data under analysis often span multiple domains. The complexity of spatial data and geographic problems, together with intrinsic spatial relationships, constitutes an enormous challenge to conventional data mining methods and calls for both theoretical research and the development of new techniques to assist in deriving information from large and heterogeneous spatial datasets (Han and Kamber 2001; Miller and Han 2001; Gahegan and Brodaric 2002).

Similar resources

Coordinating computational and visual approaches for interactive feature selection and multivariate clustering

Unknown (and unexpected) multivariate patterns lurking in high-dimensional datasets are often very hard to find. This paper describes a human-centered exploration environment, which incorporates a coordinated suite of computational and visualization methods to explore high-dimensional data for uncovering patterns in multivariate spaces. Specificall...

Methods for regression analysis in high-dimensional data

With advances in science, knowledge, and technology, new and precise methods for measuring, collecting, and recording information have been developed, which has led to the emergence and growth of high-dimensional data. A high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...

A Model for Tax Evasion Forecasting based on ID3 Algorithm and Bayesian Network

Nowadays, knowledge is a valuable and strategic source as well as an asset for evaluation and forecasting. Presenting these strategies in discovering corporate tax evasion has become an important topic today and various solutions have been proposed. In the past, various approaches to identify tax evasion and the like have been presented, but these methods have not been very accurate and the ove...

ICEAGE: Interactive Clustering and Exploration of Large and High-Dimensional Geodata

The unprecedented large size and high dimensionality of existing geographic datasets make the complex patterns that potentially lurk in the data hard to find. Clustering is one of the most important techniques for geographic knowledge discovery. However, existing clustering methods have two severe drawbacks for this purpose. First, spatial clustering methods focus on the specific characteristics ...

An interactive visual testbed system for dimension reduction and clustering of large-scale high-dimensional data

Many modern data sets, such as text and image data, can be represented in high-dimensional vector spaces and have benefited from advanced computational methods. Visual analytics approaches have contributed greatly to data understanding and analysis due to their capability of leveraging humans’ ability for quick visual perception. However, visual analytics...

Publication date: 2003